CUDA Programming Guide: The Shift to Throughput-Oriented Computing

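Computing is undergoing a fundamental shift from latency-optimized CPU designs to throughput-oriented GPU architectures. A CPU is like a fast delivery bike that gets a single package to its destination quickly, whereas a GPU is like a giant cargo ship: slow on any one trip, but able to carry 50,000 containers at once.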

1. Latency vs. Throughput

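A CPU is designed to minimize the time to completion of a single sequence of instructions, using sophisticated branch prediction. A graphics processing unit (GPU), by contrast, is designed to maximize the amount of work done per second by running thousands of threads in parallel, trading single-thread speed for higher aggregate throughput.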
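The distinction can be made concrete with a little arithmetic based on the bike and cargo-ship analogy. The following is a minimal sketch, not a hardware model: the 50,000-container capacity comes from the analogy above, while the trip times (1 hour for the bike, 500 hours for the ship) and the function names are hypothetical values chosen only for illustration.

```python
import math


def total_time(items, capacity, trip_time):
    """Hours needed to deliver `items` when one vehicle makes back-to-back
    trips, each carrying up to `capacity` items and lasting `trip_time` hours."""
    trips = math.ceil(items / capacity)
    return trips * trip_time


def throughput(capacity, trip_time):
    """Items delivered per hour when every trip is full."""
    return capacity / trip_time


# Hypothetical vehicles: the trip times are assumptions, not measurements.
BIKE = {"capacity": 1, "trip_time": 1.0}        # low latency, low throughput
SHIP = {"capacity": 50_000, "trip_time": 500.0}  # high latency, high throughput

if __name__ == "__main__":
    for items in (1, 50_000):
        bike_hours = total_time(items, **BIKE)
        ship_hours = total_time(items, **SHIP)
        print(f"{items} item(s): bike {bike_hours} h, ship {ship_hours} h")
    print("bike throughput:", throughput(**BIKE), "items/h")
    print("ship throughput:", throughput(**SHIP), "items/h")
```

Under these assumed numbers the bike wins for a single item (1 hour against 500), while the ship wins for 50,000 items (500 hours against 50,000). That is the same trade a GPU makes: each individual thread is slower, but the total work finished per unit of time is far higher.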

2. Transistor Allocation

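Within a similar price and power envelope, a GPU provides much higher instruction throughput and memory bandwidth than a CPU. Because GPUs are specialized for highly parallel computation, they devote more transistors to data processing units (ALUs), whereas CPUs devote more transistors to data caching and flow control.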

3. The Evolution of CUDA

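NVIDIA introduced Compute Unified Device Architecture (CUDA) in 2006. It is a parallel computing platform and programming model that harnesses the power of the GPU without relying on graphics APIs, with the aim of dramatically increasing computing performance.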

QUESTION 1

Which component consumes the majority of silicon real estate in a traditional CPU?

Arithmetic Logic Units (ALUs)

Control logic and Data Caching

Floating Point Units

Memory Controllers

QUESTION 2

What was the original purpose of the GPU before CUDA?

General purpose scientific computing

Operating system kernel management

Fixed-function hardware for 3D rendering

High-frequency trading

QUESTION 3

In the cargo ship analogy, what represents the 'Throughput'?

The speed at which the ship moves across the ocean.

The total volume of containers delivered at once.

The size of the ship's engine.

The fuel efficiency per container.

QUESTION 4

What is the primary trade-off made by GPUs to achieve high aggregate throughput?

Higher power consumption per unit.

Lower single-thread performance.

Reduced memory bandwidth.

Simplified mathematical precision.

QUESTION 5

Which NVIDIA software component is required to run CUDA applications?

DirectX 12

NVIDIA Driver and CUDA Toolkit

OpenGL Wrapper

Windows GDI+